Clustering over High-Dimensional Data Streams Based on Grid Density and Effective Dimension
Abstract
Clustering algorithms based on grids and density have many attractive properties. For high-dimensional data streams, however, the number of grid cells grows sharply as the dimensionality of the space increases. To address this defect, we propose GDH-Stream, a clustering method for high-dimensional data streams based on effective dimension and grid density, which consists of an online component and an offline component. First, we define the effective value of a dimension and give a method for partitioning each dimension into projection intervals. The effective values of the dimensions are then calculated and ranked in decreasing order, and a subset of dimensions is chosen to generate a subspace. In the online component, as the data stream arrives, GDH-Stream maps each data point to the original grid structure. In the offline component, when a clustering request arrives, the subspace is generated from the effective values of the dimensions. The original grid structure is then projected onto the subspace to form a new grid structure, and clustering is performed on the new grid structure according to the connectivity of the dense grids. Experimental results show that GDH-Stream achieves better clustering quality and efficiency. GDH-Stream also scales well for clustering data streams.
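The online/offline pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the GDH-Stream implementation: the abstract does not give the formula for the effective value of a dimension, so `effective_value` below is a hypothetical stand-in (the sum of squared interval fractions, which is larger when points concentrate in few projection intervals). The fixed cell `width`, the threshold `min_density`, the subspace size `k`, and all function names are likewise assumptions made for the example, and density decay over time is omitted.

```python
from collections import Counter, deque


def cell_of(point, width):
    """Online step: map a data point to the index tuple of its grid cell."""
    return tuple(int(x // width) for x in point)


def effective_value(grid, dim):
    """HYPOTHETICAL proxy for the paper's effective value of a dimension.

    Sums the squared fraction of points in each projection interval of
    `dim`; the paper's actual definition is not given in the abstract.
    """
    hist = Counter()
    for cell, count in grid.items():
        hist[cell[dim]] += count
    total = sum(hist.values())
    return sum((count / total) ** 2 for count in hist.values())


def project(grid, dims):
    """Offline step: project the original grid onto the chosen dimensions."""
    sub = Counter()
    for cell, count in grid.items():
        sub[tuple(cell[d] for d in dims)] += count
    return sub


def cluster_dense_cells(grid, min_density):
    """Group dense cells that are connected through face-adjacent neighbours."""
    dense = {cell for cell, count in grid.items() if count >= min_density}
    seen, clusters = set(), []
    for start in dense:
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:
            cell = queue.popleft()
            component.add(cell)
            for i in range(len(cell)):
                for delta in (-1, 1):
                    nb = cell[:i] + (cell[i] + delta,) + cell[i + 1:]
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        clusters.append(component)
    return clusters


def cluster_request(grid, n_dims, k, min_density):
    """Rank dimensions by effective value, keep the top k, cluster the subspace."""
    ranked = sorted(range(n_dims),
                    key=lambda d: effective_value(grid, d), reverse=True)
    dims = sorted(ranked[:k])
    return dims, cluster_dense_cells(project(grid, dims), min_density)


# Toy stream: two groups separated along dimension 0, spread out along dimension 1.
stream = [(x, i + 0.5) for i in range(10) for x in (0.5, 5.5)]
grid = Counter()
for point in stream:                      # online component
    grid[cell_of(point, width=1.0)] += 1

dims, clusters = cluster_request(grid, n_dims=2, k=1, min_density=5)  # offline
full_space_clusters = cluster_dense_cells(grid, min_density=5)
```

In this toy stream every cell of the full two-dimensional grid holds a single point, so no cell reaches the density threshold and `full_space_clusters` is empty. After projection onto the selected dimension 0, the counts accumulate into two dense, non-adjacent cells, which are reported as two clusters.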
Related articles
DENGRIS-Stream: A Density-Grid based Clustering Algorithm for Evolving Data Streams over Sliding Window
Evolving data streams are ubiquitous. Various clustering algorithms have been developed to extract useful knowledge from evolving data streams in real time. Density-based clustering methods can handle outliers and discover arbitrarily shaped clusters, whereas grid-based clustering offers fast processing. The sliding window is a widely used model for data stream mining due to its e...
An Adaptive Grid-based Method for Clustering Multi-Dimensional Online Data Streams
Clustering is an important task in mining the evolving data streams. A lot of data streams are high dimensional in nature. Clustering in the high dimensional data space is a complex problem, which is inherently more complex for data streams. Most data stream clustering methods are not capable of dealing with high dimensional data streams; therefore they sacrifice the accuracy of clusters. In or...
High-Dimensional Unsupervised Active Learning Method
In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...
Research on Cluster Analysis of High Dimensional Space Based on Fuzzy Extension
Traditional spatial data generally have high-dimensional features, and conventional clustering cannot be applied directly to high-dimensional data because of the dimension effect and the data sparseness problem. The CLIQUE algorithm is prone to problems such as over-clustering in non-axis directions, boundary judgment of fuzzy clustering, and smoothing clustering. In this ...
Adjustable Probability Density Grid-Based Clustering for Uncertain Data Streams
Most existing grid-based clustering algorithms for uncertain data streams use a fixed meshing method and therefore suffer from low clustering accuracy. In view of these deficiencies, this paper proposes a novel algorithm, APDG-CUStream (Adjustable Probability Density Grid-based Clustering for Uncertain Data Streams), which adopts an online component and an offline component. In o...
Publication date: 2011